Comparative Genomic Hybridization Assays Using Immobilized Oligonucleotide Features and Compositions for Practicing the Same

ABSTRACT

Comparative genomic hybridization assays and compositions for use in practicing the same are provided. A characteristic of the subject comparative genomic hybridization assays is that solid support immobilized oligonucleotide feature elements, e.g., in the form of an array, are employed. Specifically, at least first and second nucleic acid populations prepared from genomic templates are contacted with a plurality of distinct oligonucleotide feature elements immobilized on a solid support surface and the binding of the at least first and second populations is then evaluated. Also provided are kits for use in practicing the subject methods.

CROSS-REFERENCE TO RELATED APPLICATIONS

Pursuant to 35 U.S.C. §119(e), this application claims priority to the filing date of U.S. Provisional Patent Application Ser. No. 60/436,053, filed Dec. 23, 2002, the disclosure of which is herein incorporated by reference.

INTRODUCTION

1. Technical Field

The technical field of this invention is comparative genomic hybridization (CGH).

2. Background of the Invention

Many genomic and genetic studies are directed to the identification of differences in gene dosage or expression among cell populations for the study and detection of disease. For example, many malignancies involve the gain or loss of DNA sequences resulting in activation of oncogenes or inactivation of tumor suppressor genes. Identification of the genetic events leading to neoplastic transformation and subsequent progression can facilitate efforts to define the biological basis for disease, improve prognostication of therapeutic response, and permit earlier tumor detection. In addition, perinatal genetic problems frequently result from loss or gain of chromosome segments such as trisomy 21 or the microdeletion syndromes. Thus, methods of prenatal detection of such abnormalities can be helpful in early diagnosis of disease.

Comparative genomic hybridization (CGH) is one approach that has been employed to detect the presence and identify the location of amplified or deleted sequences. In one implementation of CGH, genomic DNA is isolated from normal reference cells, as well as from test cells (e.g., tumor cells). The two nucleic acids are differentially labeled and then simultaneously hybridized in situ to metaphase chromosomes of a reference cell. Chromosomal regions in the test cells which are at increased or decreased copy number can be identified by detecting regions where the ratio of signal from the two DNAs is altered. For example, those regions that have been decreased in copy number in the test cells will show relatively lower signal from the test DNA than the reference compared to other regions of the genome. Regions that have been increased in copy number in the test cells will show relatively higher signal from the test DNA.
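The ratio-based readout described above can be illustrated with a minimal sketch. The function names, the use of a log2 transform, and the 0.3 threshold are illustrative assumptions and are not taken from this disclosure; the signal values passed in would be background-corrected intensities from the two differentially labeled DNAs.

```python
import math


def log2_ratio(test_signal, reference_signal):
    """Return log2(test / reference) for one genomic region.

    Both signals must be positive. The log2 transform is an assumed
    convention that makes gains and losses symmetric about zero.
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(test_signal / reference_signal)


def classify_region(test_signal, reference_signal, threshold=0.3):
    """Label a region as 'gain', 'loss', or 'unchanged'.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    ratio = log2_ratio(test_signal, reference_signal)
    if ratio > threshold:
        return "gain"      # relatively higher signal from the test DNA
    if ratio < -threshold:
        return "loss"      # relatively lower signal from the test DNA
    return "unchanged"
```

Under these assumptions, a region whose test signal is twice its reference signal would be labeled a gain, and a region whose test signal is half its reference signal would be labeled a loss.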

In a recent variation of the above traditional CGH approach, the chromosomes to which the labeled nucleic acids are hybridized have been replaced with a collection of solid support bound nucleic acids, e.g., an array of BAC (bacterial artificial chromosome) clones or cDNAs. Such approaches offer benefits over immobilized chromosome approaches, including a higher resolution, as defined by the ability of the assay to localize chromosomal alterations to specific areas of the genome. However, these methods still have significant limitations in their ability to detect chromosomal alterations at single gene resolution (in the case of BAC clone arrays) or in non-coding regions of the genome (in the case of cDNA clone arrays). In addition, array features containing longer lengths of nucleic acid sequence are more susceptible to binding cross-hybridizing sequences, where a given immobilized nucleic acid hybridizes to more than one distinct sequence in solution. This property somewhat limits the ability of these technologies to detect low level amplifications and deletions sensitively and accurately.

In an effort to address at least some of the above disadvantages associated with the use of cDNA arrays in CGH applications, the suggestion has been made to employ arrays of oligonucleotides instead of cDNA arrays. See specifically U.S. Pat. No. 6,465,182. However, while U.S. Pat. No. 6,465,182 suggests CGH methods that employ oligonucleotide arrays, it also teaches that one must reduce the complexity of the sample in order to use such arrays.

There are situations where one wishes to screen a sample of non-reduced complexity, e.g., a labeled sample whose complexity is substantially the same, if not the same, as the genomic nucleic acid source from which it has been produced. Accordingly, there is interest in the development of improved array based CGH methods, particularly methods that employ oligonucleotide arrays to screen samples of non-reduced complexity. The present invention satisfies this need.

RELEVANT LITERATURE

United States patents of interest include: U.S. Pat. Nos. 6,465,182; 6,335,167; 6,251,601; 6,210,878; 6,197,501; 6,159,685; 5,965,362; 5,830,645; 5,665,549; 5,447,841 and 5,348,855. Also of interest are published United States Application Serial No. 2002/0006622 and published PCT application WO 99/23256. Articles of interest include: Kallioniemi et al., Science (1992) 258:818-21; Pinkel et al., Nat. Genet. (1998) 20:207-11; Nat. Genet. (1999) 23:41-6; and Science (1995) 270:467-470; and Pollack et al., Proc. Nat'l Acad. Sci. USA (Oct. 1, 2002) 99:12963-12968. Also of interest is the following poster abstract: Baldocchi et al., “Oligonucleotide-array-based comparative genomic hybridization,” The Microarray Meeting, Scottsdale Ariz., Sep. 22-25, 1999.

SUMMARY OF THE INVENTION

Comparative genomic hybridization assays and compositions for use in practicing the same are provided. A characteristic of the subject comparative genomic hybridization assays is that solid support immobilized oligonucleotide features, e.g., in the form of an array, are employed. Specifically, at least first and second nucleic acid populations prepared from different genomic sources are contacted with a plurality of oligonucleotide features immobilized on a solid support surface and the binding of the at least first and second populations is then evaluated. Also provided are kits for use in practicing the subject methods.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows an exemplary substrate carrying an array, such as may be used in the devices of the subject invention.

FIG. 2 shows an enlarged view of a portion of FIG. 1 showing spots or features.

FIG. 3 is an enlarged view of a portion of the substrate of FIG. 1.

FIGS. 4 to 7 provide graphical results of various experiments reported in the Experimental section, below.

DEFINITIONS

The term “oligomer” is used herein to indicate a chemical entity that contains a plurality of monomers. As used herein, the terms “oligomer” and “polymer” are used interchangeably. Examples of oligomers and polymers include polydeoxyribonucleotides (DNA), polyribonucleotides (RNA), other nucleic acids that are C-glycosides of a purine or pyrimidine base, polypeptides (proteins) or polysaccharides (starches, or polysugars), as well as other chemical entities that contain repeating units of like chemical structure.

The term “nucleic acid” as used herein means a polymer composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions.

The terms “ribonucleic acid” and “RNA” as used herein mean a polymer composed of ribonucleotides.

The terms “deoxyribonucleic acid” and “DNA” as used herein mean a polymer composed of deoxyribonucleotides.

The term “oligonucleotide” as used herein denotes single stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length.

The term “functionalization” as used herein relates to modification of a solid substrate to provide a plurality of functional groups on the substrate surface. By a “functionalized surface” is meant a substrate surface that has been modified so that a plurality of functional groups are present thereon.

The terms “reactive site”, “reactive functional group” or “reactive group” refer to moieties on a monomer, polymer or substrate surface that may be used as the starting point in a synthetic organic process. This is contrasted to “inert” hydrophilic groups that could also be present on a substrate surface, e.g., hydrophilic sites associated with polyethylene glycol, a polyamide or the like.

The term “sample” as used herein relates to a material or mixture of materials, typically, although not necessarily, in fluid form, containing one or more components of interest.

The terms “nucleoside” and “nucleotide” are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms “nucleoside” and “nucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

The phrase “oligonucleotide bound to a surface of a solid support” refers to an oligonucleotide or mimetic thereof, e.g., PNA, that is immobilized on a surface of a solid substrate in a feature or spot, where the substrate can have a variety of configurations, e.g., a sheet, bead, or other structure. In certain embodiments, the collections of features of oligonucleotides employed herein are present on a surface of the same planar support, e.g., in the form of an array.

The term “array” encompasses the term “microarray” and refers to an ordered array presented for binding to nucleic acids and the like. Arrays, as described in greater detail below, are generally made up of a plurality of distinct or different features. The term “feature” is used interchangeably herein with the terms: “features,” “feature elements,” “spots,” “addressable regions,” “regions of different moieties,” “surface or substrate immobilized elements” and “array elements,” where each feature is made up of oligonucleotides bound to a surface of a solid support, also referred to as substrate immobilized nucleic acids.

An “array” includes any one-dimensional, two-dimensional or substantially two-dimensional (as well as a three-dimensional) arrangement of addressable regions (i.e., features, e.g., in the form of spots) bearing nucleic acids, particularly oligonucleotides or synthetic mimetics thereof (i.e., the oligonucleotides defined above), and the like. Where the arrays are arrays of nucleic acids, the nucleic acids may be adsorbed, physisorbed, chemisorbed, or covalently attached to the arrays at any point or points along the nucleic acid chain.

Any given substrate may carry one, two, four or more arrays disposed on a front surface of the substrate. Depending upon the use, any or all of the arrays may be the same or different from one another and each may contain multiple spots or features. A typical array may contain one or more, including more than two, more than ten, more than one hundred, more than one thousand, more than ten thousand features, or even more than one hundred thousand features, in an area of less than 20 cm² or even less than 10 cm², e.g., less than about 5 cm², including less than about 1 cm², less than about 1 mm², e.g., 100 μm², or even smaller. For example, features may have widths (that is, diameter, for a round spot) in the range from 10 μm to 1.0 cm. In other embodiments each feature may have a width in the range of 1.0 μm to 1.0 mm, usually 5.0 μm to 500 μm, and more usually 10 μm to 200 μm. Non-round features may have area ranges equivalent to that of circular features with the foregoing width (diameter) ranges. At least some, or all, of the features are of different compositions (for example, when any repeats of each feature composition are excluded the remaining features may account for at least 5%, 10%, 20%, 50%, 95%, 99% or 100% of the total number of features). Inter-feature areas will typically (but not essentially) be present which do not carry any nucleic acids (or other biopolymer or chemical moiety of a type of which the features are composed). Such inter-feature areas typically will be present where the arrays are formed by processes involving drop deposition of reagents but may not be present when, for example, photolithographic array fabrication processes are used. It will be appreciated though, that the inter-feature areas, when present, could be of various sizes and configurations.
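Two of the quantities in the preceding paragraph lend themselves to a short worked sketch: the area of a round feature of a given width, which is the basis for the "equivalent" area ranges of non-round features, and the share of features remaining once repeats of each composition are excluded. The function names and the representation of each feature composition as a sequence string are illustrative assumptions.

```python
import math


def circular_feature_area_um2(width_um):
    """Area in square micrometers of a round feature of the given diameter."""
    return math.pi * (width_um / 2.0) ** 2


def distinct_feature_fraction(feature_compositions):
    """Fraction of features left after repeats of each composition are excluded.

    `feature_compositions` is a non-empty list with one entry per feature,
    here assumed to be the oligonucleotide sequence carried by that feature.
    """
    if not feature_compositions:
        raise ValueError("at least one feature is required")
    return len(set(feature_compositions)) / len(feature_compositions)
```

For instance, a hypothetical four-feature array in which two features carry the same sequence would have a distinct feature fraction of 0.75.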

Each array may cover an area of less than 200 cm², or even less than 50 cm², 5 cm², 1 cm², 0.5 cm², or 0.1 cm². In certain embodiments, the substrate carrying the one or more arrays will be shaped generally as a rectangular solid (although other shapes are possible), having a length of more than 4 mm and less than 150 mm, usually more than 4 mm and less than 80 mm, more usually less than 20 mm; a width of more than 4 mm and less than 150 mm, usually less than 80 mm and more usually less than 20 mm; and a thickness of more than 0.01 mm and less than 5.0 mm, usually more than 0.1 mm and less than 2 mm and more usually more than 0.2 mm and less than 1.5 mm, such as more than about 0.8 mm and less than about 1.2 mm. With arrays that are read by detecting fluorescence, the substrate may be of a material that emits low fluorescence upon illumination with the excitation light. Additionally in this situation, the substrate may be relatively transparent to reduce the absorption of the incident illuminating laser light and subsequent heating if the focused laser beam travels too slowly over a region. For example, the substrate may transmit at least 20%, or 50% (or even at least 70%, 90%, or 95%), of the illuminating light incident on the front as may be measured across the entire integrated spectrum of such illuminating light or alternatively at 532 nm or 633 nm.

Arrays can be fabricated using drop deposition from pulse-jets of either nucleic acid precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained nucleic acid. Such methods are described in detail in, for example, the previously cited references including U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. As already mentioned, these references are incorporated herein by reference. Other drop deposition methods can be used for fabrication, as previously described herein. Also, instead of drop deposition methods, photolithographic array fabrication methods may be used. Inter-feature areas need not be present particularly when the arrays are made by photolithographic methods as described in those patents.

In certain embodiments of particular interest, in situ prepared arrays are employed. In situ prepared oligonucleotide arrays, e.g., nucleic acid arrays, may be characterized by having surface properties of the substrate that differ significantly between the feature and inter-feature areas. Specifically, such arrays may have high surface energy, hydrophilic features and low surface energy, hydrophobic interfeature regions. Whether a given region, e.g., feature or interfeature region, of a substrate has a high or low surface energy can be readily determined by determining the region's “contact angle” with water, as known in the art and further described in copending application Ser. No. 10/449,838, the disclosure of which is herein incorporated by reference. Other features of in situ prepared arrays that make such array formats of particular interest in certain embodiments of the present invention include, but are not limited to: feature density, oligonucleotide density within each feature, feature uniformity, low intra-feature background, low inter-feature background, e.g., due to hydrophobic interfeature regions, fidelity of oligonucleotide elements making up the individual features, array/feature reproducibility, and the like. The above benefits of in situ produced arrays assist in maintaining adequate sensitivity while operating under stringency conditions required to accommodate highly complex samples.

An array is “addressable” when it has multiple regions of different moieties, i.e., features (e.g., each made up of different oligonucleotide sequences) such that a region (i.e., a “feature” or “spot” of the array) at a particular predetermined location (i.e., an “address”) on the array will detect a particular solution phase nucleic acid sequence. Array features are typically, but need not be, separated by intervening spaces.

An exemplary array is shown in FIGS. 1-3, where the array shown in this representative embodiment includes a contiguous planar substrate 110 carrying an array 112 disposed on a rear surface 111 b of substrate 110. It will be appreciated though, that more than one array (any of which are the same or different) may be present on rear surface 111 b, with or without spacing between such arrays. That is, any given substrate may carry one, two, four or more arrays disposed on a front surface of the substrate and depending on the use of the array, any or all of the arrays may be the same or different from one another and each may contain multiple spots or features. The one or more arrays 112 usually cover only a portion of the rear surface 111 b, with regions of the rear surface 111 b adjacent the opposed sides 113 c, 113 d and leading end 113 a and trailing end 113 b of slide 110, not being covered by any array 112. A front surface 111 a of the slide 110 does not carry any arrays 112. Each array 112 can be designed for testing against any type of sample, whether a trial sample, reference sample, a combination of them, or a known mixture of biopolymers such as polynucleotides. Substrate 110 may be of any shape, as mentioned above.

As mentioned above, array 112 contains multiple spots or features 116 of oligomers, e.g., in the form of polynucleotides, and specifically oligonucleotides. As mentioned above, all of the features 116 may be different, or some or all could be the same. The interfeature areas 117 could be of various sizes and configurations. Each feature carries a predetermined oligomer such as a predetermined polynucleotide (which includes the possibility of mixtures of polynucleotides). It will be understood that there may be a linker molecule (not shown) of any known types between the rear surface 111 b and the first nucleotide.

Substrate 110 may carry on front surface 111 a, an identification code, e.g., in the form of a bar code (not shown) or the like printed on a substrate in the form of a paper label attached by adhesive or any convenient means. The identification code contains information relating to array 112, where such information may include, but is not limited to, an identification of array 112, i.e., layout information relating to the array(s), etc.

In the case of an array in the context of the present application, the “target” may be referenced as a moiety in a mobile phase (typically fluid), to be detected by “probes” which are bound to the substrate at the various regions.

A “scan region” refers to a contiguous (preferably, rectangular) area in which the array spots or features of interest, as defined above, are found or detected. Where fluorescent labels are employed, the scan region is that portion of the total area illuminated from which the resulting fluorescence is detected and recorded. Where other detection protocols are employed, the scan region is that portion of the total area queried from which resulting signal is detected and recorded. For the purposes of this invention and with respect to fluorescent detection embodiments, the scan region includes the entire area of the slide scanned in each pass of the lens, between the first feature of interest, and the last feature of interest, even if there exist intervening areas that lack features of interest.

An “array layout” refers to one or more characteristics of the features, such as feature positioning on the substrate, one or more feature dimensions, and an indication of a moiety at a given location. “Hybridizing” and “binding”, with respect to nucleic acids, are used interchangeably.

By “remote location,” it is meant a location other than the location at which the array is present and hybridization occurs. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items are at least in different rooms or different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart. “Communicating” information references transmitting the data representing that information as electrical signals over a suitable communication channel (e.g., a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. An array “package” may be the array plus only a substrate on which the array is deposited, although the package may include other features (such as a housing with a chamber). A “chamber” references an enclosed volume (although a chamber may be accessible through one or more ports). It will also be appreciated that throughout the present application, words such as “top,” “upper,” and “lower” are used in a relative sense only.

The term “stringent assay conditions” as used herein refers to conditions that are compatible to produce binding pairs of nucleic acids, e.g., surface bound and solution phase nucleic acids, of sufficient complementarity to provide for the desired level of specificity in the assay while being less compatible to the formation of binding pairs between binding members of insufficient complementarity to provide for the desired specificity. Stringent assay conditions are the summation or combination (totality) of both hybridization and wash conditions.

“Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization (e.g., as in array, Southern or Northern hybridizations) are sequence dependent, and are different under different experimental parameters. Stringent hybridization conditions that can be used to identify nucleic acids within the scope of the invention can include, e.g., hybridization in a buffer comprising 50% formamide, 5×SSC, and 1% SDS at 42° C., or hybridization in a buffer comprising 5×SSC and 1% SDS at 65° C., both with a wash of 0.2×SSC and 0.1% SDS at 65° C. Exemplary stringent hybridization conditions can also include a hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37° C., and a wash in 1×SSC at 45° C. Alternatively, hybridization to filter-bound DNA in 0.5 M NaHPO₄, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1×SSC/0.1% SDS at 68° C. can be employed. Yet additional stringent hybridization conditions include hybridization at 60° C. or higher and 3×SSC (450 mM sodium chloride/45 mM sodium citrate) or incubation at 42° C. in a solution containing 30% formamide, 1 M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5. Those of ordinary skill will readily recognize that alternative but comparable hybridization and wash conditions can be utilized to provide conditions of similar stringency.
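The SSC strengths quoted above scale linearly from a 1× stock. The sketch below converts an "n×SSC" strength into millimolar component concentrations, assuming the conventional 1×SSC composition of 150 mM sodium chloride and 15 mM sodium citrate, which is consistent with the 3×SSC figure given in the text (450 mM/45 mM). The constant and function names are illustrative.

```python
# Assumed 1x SSC composition, consistent with 3x SSC = 450 mM NaCl / 45 mM citrate.
SSC_1X_NACL_MM = 150.0
SSC_1X_CITRATE_MM = 15.0


def ssc_concentrations_mm(fold):
    """Return (NaCl mM, sodium citrate mM) for a buffer at `fold` x SSC."""
    if fold < 0:
        raise ValueError("fold strength must be non-negative")
    return (fold * SSC_1X_NACL_MM, fold * SSC_1X_CITRATE_MM)


# Strengths mentioned in the text, from hybridization down to the most stringent wash.
for strength in (5, 3, 1, 0.2, 0.1):
    nacl, citrate = ssc_concentrations_mm(strength)
    print(f"{strength}x SSC: {nacl:.1f} mM NaCl, {citrate:.1f} mM sodium citrate")
```

Lower SSC strengths mean lower salt, which, together with higher temperature, makes a wash more stringent.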

In certain embodiments, it is the stringency of the wash conditions that sets forth the conditions which determine whether a nucleic acid is specifically hybridized to a surface bound nucleic acid. Wash conditions used to identify nucleic acids may include, e.g.: a salt concentration of about 0.02 molar at pH 7 and a temperature of at least about 50° C. or about 55° C. to about 60° C.; or, a salt concentration of about 0.15 M NaCl at 72° C. for about 15 minutes; or, a salt concentration of about 0.2×SSC at a temperature of at least about 50° C. or about 55° C. to about 60° C. for about 15 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2×SSC containing 0.1% SDS at room temperature for 15 minutes and then washed twice by 0.1×SSC containing 0.1% SDS at 68° C. for 15 minutes; or, equivalent conditions. Stringent conditions for washing can also be, e.g., 0.2×SSC/0.1% SDS at 42° C.

A specific example of stringent assay conditions is rotating hybridization at 65° C. in a salt based hybridization buffer with a total monovalent cation concentration of 1.5 M (e.g., as described in U.S. patent application Ser. No. 09/655,482 filed on Sep. 5, 2000, the disclosure of which is herein incorporated by reference) followed by washes of 0.5×SSC and 0.1×SSC at room temperature.

Stringent assay conditions are hybridization conditions that are at least as stringent as the above representative conditions, where a given set of conditions is considered to be at least as stringent if substantially no additional binding complexes that lack sufficient complementarity to provide for the desired specificity are produced in the given set of conditions as compared to the above specific conditions, where by “substantially no additional” is meant less than about 5-fold more, typically less than about 3-fold more. Other stringent hybridization conditions are known in the art and may also be employed, as appropriate.
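The fold-based comparison in the preceding paragraph can be sketched as a simple check. This is one possible reading only: it assumes that the insufficiently complementary binding complexes formed under each set of conditions have been quantified in the same units, and it treats "less than about 5-fold more" as a ratio below 5. The function and parameter names are hypothetical.

```python
def is_at_least_as_stringent(candidate_nonspecific, reference_nonspecific,
                             max_fold=5.0):
    """Compare nonspecific binding under candidate vs. reference conditions.

    Returns True when the candidate conditions yield less than `max_fold`
    times the nonspecific binding complexes seen under the reference
    conditions. Use max_fold=3.0 for the stricter "typically" criterion.
    """
    if reference_nonspecific <= 0:
        raise ValueError("reference measurement must be positive")
    return candidate_nonspecific / reference_nonspecific < max_fold
```

With these assumptions, candidate conditions producing twice the nonspecific binding of the reference would still qualify, while conditions producing six times as much would not.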

Sensitivity is a term used to refer to the ability of a given assay to detect a given analyte in a sample, e.g., a nucleic acid species of interest. For example, an assay has high sensitivity if it can detect a small concentration of analyte molecules in a sample. Conversely, a given assay has low sensitivity if it only detects a large concentration of analyte molecules (i.e., specific solution phase nucleic acids of interest) in a sample. A given assay's sensitivity is dependent on a number of parameters, including specificity of the reagents employed (e.g., types of labels, types of binding molecules, etc.), assay conditions employed, detection protocols employed, and the like. In the context of array hybridization assays, such as those of the present invention, sensitivity of a given assay may be dependent upon one or more of: the nature of the surface immobilized nucleic acids, the nature of the hybridization and wash conditions, the nature of the labeling system, the nature of the detection system, etc.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Comparative genomic hybridization assays and compositions for use in practicing the same are provided. A characteristic of the subject comparative genomic hybridization assays is that solid support immobilized oligonucleotide features, e.g., in the form of an array, are employed. Specifically, at least first and second nucleic acid populations prepared from genomic sources are contacted with a plurality of distinct oligonucleotide features immobilized on a solid support surface and the binding of the at least first and second populations is then evaluated. Also provided are kits for use in practicing the subject methods.

Before the subject invention is described further, it is to be understood that the invention is not limited to the particular embodiments of the invention described below, as variations of the particular embodiments may be made and still fall within the scope of the appended claims. It is also to be understood that the terminology employed is for the purpose of describing particular embodiments, and is not intended to be limiting. Instead, the scope of the present invention will be established by the appended claims.

In this specification and the appended claims, the singular forms “a,” “an” and “the” include plural reference unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs. Although any methods, devices and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods, devices and materials are now described.

All publications mentioned herein are incorporated herein by reference for the purpose of describing and disclosing the invention components that are described in the publications that might be used in connection with the presently described invention.

As summarized above, the present invention provides methods for comparing populations of nucleic acids, e.g., in array Comparative Genomic Hybridization (aCGH) applications, and compositions for use therein. In further describing the present invention, the subject methods are discussed first in greater detail, followed by a review of representative kits for use in practicing the subject methods.

Methods

The subject invention provides methods for comparing populations of nucleic acids and compositions for use therein, where a characteristic of the subject methods is the use of a population of distinct substrate immobilized oligonucleotide features, e.g., an array of substrate immobilized oligonucleotide features.

In practicing the subject methods, the first step is to provide at least two different populations or collections of nucleic acids that are to be compared. The two or more populations of nucleic acids may or may not be labeled, depending on the particular detection protocol employed in a given assay. For example, in certain embodiments, binding events on the surface of a substrate may be detected by means other than by detection of labeled nucleic acids, such as by change in conformation of a conformationally labeled immobilized oligonucleotide, detection of electrical signals caused by binding events on the substrate surface, etc. In many embodiments, however, the populations of nucleic acids are labeled, where the populations may be labeled with the same label or different labels, depending on the actual assay protocol employed. For example, where each population is to be contacted with different but identical arrays, each nucleic acid population or collection may be labeled with the same label. Alternatively, where both populations are to be simultaneously contacted with a single array of immobilized oligonucleotide features, i.e., cohybridized to the same array of immobilized nucleic acid features, solution-phase collections or populations of nucleic acids that are to be compared are generally distinguishably or differentially labeled with respect to each other.

The two or more (i.e., at least first and second, where the number of different collections may, in certain embodiments, be three, four or more) populations of nucleic acids are prepared from different genomic sources. As such, the first step in many embodiments of the subject methods is to prepare a collection of nucleic acids, e.g., labeled nucleic acids, from an initial genomic source for each genome that is to be compared.

The term genome refers to all nucleic acid sequences (coding and non-coding) and elements present in or originating from any virus, single cell (prokaryote and eukaryote) or each cell type and their organelles (e.g. mitochondria) in a metazoan organism. The term genome also applies to any naturally occurring or induced variation of these sequences that may be present in a mutant or disease variant of any virus or cell type. These sequences include, but are not limited to, those involved in the maintenance, replication, segregation, and higher order structures (e.g. folding and compaction of DNA in chromatin and chromosomes), or other functions, if any, of the nucleic acids as well as all the coding regions and their corresponding regulatory elements needed to produce and maintain each particle, cell or cell type in a given organism.

For example, the human genome consists of approximately 3×10⁹ base pairs of DNA organized into distinct chromosomes. The genome of a normal diploid somatic human cell consists of 22 pairs of autosomes (chromosomes 1 to 22) and either chromosomes X and Y (males) or a pair of chromosome Xs (females) for a total of 46 chromosomes. A genome of a cancer cell may contain variable numbers of each chromosome in addition to deletions, rearrangements and amplification of any subchromosomal region or DNA sequence.

By “genomic source” is meant the initial nucleic acids that are used as the original nucleic acid source from which the solution phase nucleic acids are produced, e.g., as a template in the labeled solution phase nucleic acid generation protocols described in greater detail below.

The genomic source may be prepared using any convenient protocol. In many embodiments, the genomic source is prepared by first obtaining a starting composition of genomic DNA, e.g., a nuclear fraction of a cell lysate, where any convenient means for obtaining such a fraction may be employed and numerous protocols for doing so are well known in the art. The genomic source is, in many embodiments of interest, genomic DNA representing the entire genome from a particular organism, tissue or cell type. However, in certain embodiments the genomic source may comprise a portion of the genome, e.g., one or more specific chromosomes or regions thereof, such as PCR amplified regions produced with a pair of specific primers.

A given initial genomic source may be prepared from a subject, for example a plant or an animal, which subject is suspected of being homozygous or heterozygous for a deletion or amplification of a genomic region. In certain embodiments, the constituent molecules that make up the initial genomic source typically have an average size of at least about 1 Mb, where a representative range of sizes is from about 50 to about 250 Mb or more, while in other embodiments, the sizes may not exceed about 1 Mb, such that they may be about 1 Mb or smaller, e.g., less than about 500 Kb, etc.

In certain embodiments, the genomic source is “mammalian”, where this term is used broadly to describe organisms which are within the class Mammalia, including the orders Carnivora (e.g., dogs and cats), Rodentia (e.g., mice, guinea pigs, and rats), and Primates (e.g., humans, chimpanzees, and monkeys), where of particular interest in certain embodiments are human or mouse genomic sources. In certain embodiments, a set of nucleic acid sequences within the genomic source is complex, as the genome contains at least about 1×10⁸ base pairs, including at least about 1×10⁹ base pairs, e.g., about 3×10⁹ base pairs.

Where desired, the initial genomic source may be fragmented in the generation protocol to produce a fragmented genomic source, where the molecules have a desired average size range, e.g., up to about 10 Kb, such as up to about 1 Kb, where fragmentation may be achieved using any convenient protocol, including but not limited to: mechanical protocols, e.g., sonication, shearing, etc.; and chemical protocols, e.g., enzyme digestion, etc.

Where desired, the initial genomic source may be amplified as part of the solution phase nucleic acid generation protocol, where the amplification may or may not occur prior to any fragmentation step. In those embodiments where the produced collection of nucleic acids has substantially the same complexity as the initial genomic source from which it is prepared, the amplification step employed is one that does not reduce the complexity, e.g., one that employs a set of random primers, as described below. For example, the initial genomic source may first be amplified in a manner that results in an amplified version of virtually the whole genome, if not the whole genome, before labeling, where the fragmentation, if employed, may be performed pre- or post-amplification.

Following provision of the initial genomic source, and any initial processing steps (e.g., fragmentation, amplification, etc.) as described above, the collection of solution phase nucleic acids is prepared for use in the subject methods. In certain embodiments of particular interest, the collection of solution phase nucleic acids prepared from the initial genomic source is one that has substantially the same complexity as the complexity of the initial genomic source. Complexity, as used in describing the product nucleic acid collection/population, refers to the number of distinct or different nucleic acid sequences found in a collection of nucleic acids relative to the number of distinct or different nucleic acid sequences found in the genomic source.

The prepared collection of solution phase nucleic acids is a “non-reduced-complexity” collection of solution phase nucleic acids as compared to the initial genomic source. A non-reduced complexity collection is one that is not produced in a manner designed to reduce the complexity of the sample. Examples of protocols that can produce reduced complexity product compositions of utility in genotyping and gene expression include those described in U.S. Pat. No. 6,465,182 and published PCT application WO 99/23256; as well as published U.S. Patent Application No. 2003/0036069 and Jordan et al., Proc. Nat'l Acad. Sci. USA (Mar. 5, 2002) 99: 2942-2947. In each of these protocols that produce a reduced complexity product, primers are employed that have been designed to knowingly produce product nucleic acids from only a select fraction or portion of the initial genomic source, e.g., genome, where fraction or portion may be defined as a subset or representative subset of a genome.

A product composition is considered to be a non-reduced complexity product composition as compared to the initial nucleic acid source from which it is prepared if there is a high probability that a sequence of specific length randomly chosen from the sequence of the initial genomic source is present in the product composition, either in a single nucleic acid member of the product or in a “concatamer” of two different nucleic acid members of the product (i.e., in a virtual molecule produced by joining two different members to produce a single molecule). In other words, if there is a high probability that an N-mer sequence (i.e., a sequence of “N” nucleotides) that is randomly chosen from the initial source has the same sequence as an N-mer within the product composition (either in a single nucleic acid member of the product or in a “concatamer” of two different nucleic acid members of the product), then the product composition is considered to be a composition of non-reduced complexity as compared to the initial source. In many embodiments, the length N of the sequence (i.e., N-mer) that is randomly chosen from the initial source ranges from about 45 to about 200 nt, including from about 50 to about 100 nt, such as from about 55 to about 65 nt, e.g., 60 nt. For example, if a sequence of 60 nt in length that is randomly chosen uniformly over an initial genomic source sequence has a high probability of being in the product composition, then the product composition has a non-reduced complexity as compared to the parent composition. For this purpose, a given sequence is considered to have a high probability of being in a product composition if its probability of being in the product composition, either in a single nucleic acid member or in a concatamer of two different members, is at least about 10%, for example at least about 25%, including at least about 50%, where in certain embodiments the probability may be about 60%, about 70%, about 80%, about 90%, about 95% or higher, e.g., about 98%, etc.

With knowledge of the sequence within the genomic source and product, the probability that a given sequence randomly chosen from the initial source is present in a given product composition may be determined according to the following parameters:

Consider a nucleotide sequence of the genomic source: G. Consider a fixed integer N. Consider a collection of nucleic acids, M={m₁, m₂, . . . , m_(k)} where each m_(i) is a subsequence of G. For any N-mer sequence w, define

${\sigma_{G}(w)} = \left\{ {{\begin{matrix}1 & {w\mspace{14mu} {is}\mspace{14mu} a\mspace{14mu} {subsequence}\mspace{14mu} {of}\mspace{14mu} G} \\0 & {otherwise}\end{matrix}{\sigma_{M}(w)}} = \left\{ \begin{matrix}1 & {w\mspace{14mu} {is}\mspace{14mu} a\mspace{14mu} {subsequence}\mspace{14mu} {of}\mspace{14mu} {some}\mspace{14mu} m_{i}\mspace{14mu} {or}\mspace{14mu} {of}\mspace{14mu} {some}\mspace{14mu} {concatenation}\mspace{14mu} m_{i}*m_{j}} \\0 & {otherwise}\end{matrix} \right.} \right.$

Set

$S_{G} = \sum_{N\text{-mers } w} \sigma_{G}(w)$

and

$S_{M} = \sum_{N\text{-mers } w} \sigma_{M}(w)$

where the sums are over all mathematically possible N-mers. The probability that a random N-mer w uniformly selected over G is present in M is then

$p = \frac{S_{M}}{S_{G}}.$

From a practical point of view, the numbers S_(M) and S_(G) can be computed by stepping along the sequences and incrementing by 1 every time a new N-mer is visited. All pairwise concatamers of members of M are then processed in the same way. Given the formulas, this calculation is then obvious to anyone skilled in the art of programming.
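The counting procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names `distinct_nmers` and `presence_probability` are hypothetical, sequences are plain strings, "subsequence" is read as a contiguous stretch, and concatamers are formed only from two different members. The sketch also counts only those N-mers of M that occur in G, so that the ratio stays between 0 and 1; the formula as written counts every N-mer of M, and the two agree whenever no junction N-mer falls outside G.

```python
from itertools import permutations


def distinct_nmers(seq, n):
    """Step along seq and collect each distinct contiguous N-mer once."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def presence_probability(genome, members, n):
    """Estimate p = S_M / S_G for a genome G and a product collection M.

    Assumption of this sketch: only N-mers that also occur in G are
    counted toward S_M.
    """
    genome_nmers = distinct_nmers(genome, n)
    if not genome_nmers:
        raise ValueError("genome is shorter than N")

    product_nmers = set()
    for m in members:
        product_nmers |= distinct_nmers(m, n)

    # N-mers spanning the junction of a concatamer m_i * m_j (i != j):
    # only the last n-1 bases of m_i and the first n-1 bases of m_j matter.
    for a, b in permutations(members, 2):
        junction = a[max(0, len(a) - (n - 1)):] + b[:n - 1]
        product_nmers |= distinct_nmers(junction, n)

    return len(product_nmers & genome_nmers) / len(genome_nmers)
```

For instance, with `genome = "ACGTACGT"` and `n = 3`, the members `["AC", "GT"]` contain no 3-mer on their own, yet their two concatamers `ACGT` and `GTAC` cover all four distinct 3-mers of the genome, which is the situation the concatamer provision is meant to capture.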

A non-reduced complexity collection of nucleic acids can be readily identified using a number of different protocols. One convenient protocol for determining whether a given collection of nucleic acids is a non-reduced complexity collection of nucleic acids is to screen the collection using a genome wide array of features for the initial source, e.g., genomic source, of interest. Thus, one can tell whether a given collection of nucleic acids has non-reduced complexity with respect to its genomic source by assaying the collection with a genome wide array for the genomic source. The genome wide array of the genomic source for this purpose is an array of features in which the collection of features of the array used to test the sample is made up of sequences uniformly and independently randomly chosen from the initial genomic source. As such, sequences of sufficient length, e.g., N length as described above, independently chosen randomly from the initial nucleic acid source that uniformly sample the initial nucleic acid source are present in the collection of features on the array. By uniformly is meant that no bias is present in the selection of sequences from the initial genomic source. In such a genome wide assay of a sample, a non-reduced complexity sample is one in which substantially all of the array features on the array specifically hybridize to nucleic acids present in the sample, where by substantially all is meant at least about 10%, for example at least about 25%, including at least about 50%, such as at least about 60, 70, 75, 80, 85, 90 or 95% or more.

As such, according to the above guidelines, a sample is considered to be of non-reduced complexity as compared to its genomic source if its complexity is at least about 10%, for example at least about 25%, including at least about 50%, such as at least about 60, 70, 75, 80, 85, 90 or 95% or more of the complexity of the genomic source, as detailed above.

In many embodiments of interest, the collection or population of nucleic acids that is prepared in this step of the subject methods is one that is labeled with a detectable label. In the embodiments where the population of solution-phase nucleic acids is a non-reduced complexity population of nucleic acids, as described above, the labeled nucleic acids are prepared in a manner that does not reduce the complexity to any significant extent as compared to the initial genomic source. A number of different nucleic acid labeling protocols are known in the art and may be employed to produce a population of labeled nucleic acids. The particular protocol may include the use of labeled primers, labeled nucleotides, modified nucleotides that can be conjugated with different dyes, one or more amplification steps, etc.

In one type of representative labeling protocol of interest, the initial genomic source, which most often is fragmented (as described above), is employed in the preparation of labeled nucleic acids as a genomic template from which the labeled nucleic acids are enzymatically produced. Different types of template dependent labeled nucleic acid generation protocols are known in the art. In certain types of protocols, the template is employed in a non-amplifying primer extension nucleic acid generation protocol. In yet other embodiments, the template is employed in an amplifying primer extension protocol.

Of interest in the embodiments described above, whether they be amplifying or non-amplifying primer extension reactions, is the use of a set of primers that results in the production of the desired nucleic acid collection of high complexity, i.e., comparable or substantially similar complexity to the initial genomic source. In many embodiments, the above described population of nucleic acids in which substantially all, if not all, of the sequences found in the initial genomic source are present, is produced using a primer mixture of random primers, i.e., primers of random sequence. The primers employed in the subject methods may vary in length, and in many embodiments range in length from about 3 to about 25 nt, sometimes from about 5 to about 20 nt and sometimes from about 5 to about 10 nt. The total number of random primers of different sequence that is present in a given population of random primers may vary, and depends on the length of the primers in the set. As such, in the sets of random primers, which include all possible variations, the total number of primers n in the set of primers that is employed is 4^Y, where Y is the length of the primers. Thus, where the primer set is made up of 3-mers, Y=3 and the total number n of random primers in the set is 4³ or 64. Likewise, where the primer set is made up of 8-mers, Y=8 and the total number n of random primers in the set is 4⁸ or 65,536. Typically, an excess of random primers is employed, such that in a given primer set employed in the subject invention, multiple copies of each different random primer sequence are present, and the total number of primer molecules in the set far exceeds the total number of distinct primer sequences, where the total number may range from about 1.0×10¹⁰ to about 1.0×10²⁰, such as from about 1.0×10¹³ to about 1.0×10¹⁷, e.g., 3.7×10¹⁵.
The primers described above and throughout this specification may be prepared using any suitable method, such as, for example, the known phosphotriester and phosphite triester methods, or automated embodiments thereof. In one such automated embodiment, dialkyl phosphoramidites are used as starting materials and may be synthesized as described by Beaucage et al. (1981), Tetrahedron Letters 22, 1859. One method for synthesizing oligonucleotides on a modified solid support is described in U.S. Pat. No. 4,458,066.
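The relationship between primer length and the number of distinct random primer sequences stated above can be checked with a short Python sketch. The function names `random_primer_count` and `all_primers` are hypothetical and are used here only for illustration; the sketch assumes the four standard DNA bases.

```python
from itertools import product

BASES = "ACGT"  # assumption: the four standard DNA bases


def random_primer_count(length):
    """Number of distinct primer sequences of the given length: 4**length."""
    return len(BASES) ** length


def all_primers(length):
    """Enumerate every possible primer sequence of the given length.

    Practical only for short primers, since the list grows as 4**length.
    """
    return ["".join(p) for p in product(BASES, repeat=length)]
```

Calling `random_primer_count(3)` and `random_primer_count(8)` reproduces the 64 and 65,536 figures given above for 3-mers and 8-mers.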

As indicated above, in generating labeled nucleic acids according to these embodiments of the subject methods, the above-described genomic template and random primer population are employed together in a primer extension reaction that produces the desired labeled nucleic acids. Primer extension reactions for generating labeled nucleic acids are well known to those of skill in the art, and any convenient protocol may be employed, so long as the above described genomic source (being used as a template) and population of random primers are employed. In this step of the subject methods, the primer is contacted with the template under conditions sufficient to extend the primer and produce a primer extension product, either in an amplifying or in a non-amplifying manner (where a non-amplifying manner is one in which essentially a single product is produced per template strand). As such, the above primers are contacted with the genomic template in the presence of sufficient DNA polymerase under primer extension conditions sufficient to produce the desired primer extension molecules. DNA polymerases of interest include, but are not limited to, polymerases derived from E. coli, thermophilic bacteria, archaebacteria, phage, yeasts, Neurosporas, Drosophilas, primates and rodents. The DNA polymerase extends the primer according to the genomic template to which it is hybridized in the presence of additional reagents which may include, but are not limited to: dNTPs; monovalent and divalent cations, e.g. KCl, MgCl₂; sulfhydryl reagents, e.g. dithiothreitol; and buffering agents, e.g. Tris-Cl.

Extension products that are produced as described above are typically labeled in the present methods. As such, the reagents employed in the subject primer extension reactions typically include a labeling reagent, where the labeling reagent may be the primer or a labeled nucleotide, which may be labeled with a directly or indirectly detectable label. A directly detectable label is one that can be directly detected without the use of additional reagents, while an indirectly detectable label is one that is detectable by employing one or more additional reagents, e.g., where the label is a member of a signal producing system made up of two or more components. In many embodiments, the label is a directly detectable label, such as a fluorescent label, where the labeling reagent employed in such embodiments is a fluorescently tagged nucleotide(s), e.g., dCTP. Fluorescent moieties which may be used to tag nucleotides for producing labeled nucleic acids include, but are not limited to: fluorescein, the cyanine dyes, such as Cy3, Cy5, Alexa 555, Bodipy 630/650, and the like. Other labels may also be employed as are known in the art.

In the primer extension reactions employed in the subject methods of these embodiments, the genomic template is typically first subjected to strand disassociation conditions, e.g., subjected to a temperature ranging from about 80° C. to about 100° C., usually from about 90° C. to about 95° C. for a period of time, and the resultant disassociated template molecules are then contacted with the primer molecules under annealing conditions, where the temperature of the template and primer composition is reduced to an annealing temperature of from about 20° C. to about 80° C., usually from about 37° C. to about 65° C. In certain embodiments, a “snap-cooling” protocol is employed, where the temperature is reduced to the annealing temperature, or to about 4° C. or below, in a period of from about 1 s to about 30 s, usually from about 5 s to about 10 s.

The resultant annealed primer/template hybrids are then maintained in a reaction mixture that includes the above-discussed reagents at a sufficient temperature and for a sufficient period of time to produce the desired labeled nucleic acids. Typically, this incubation temperature ranges from about 20° C. to about 75° C., usually from about 37° C. to about 65° C. The incubation time typically ranges from about 5 min to about 18 hr, usually from about 1 hr to about 12 hr.

Using the above protocols, at least a first collection of nucleic acids and a second collection of nucleic acids are produced from two different genomic sources, e.g., a reference and test genomic template. As indicated above, depending on the particular assay protocol (e.g., whether both populations are to be hybridized simultaneously to a single array or whether each population is to be hybridized to two different but substantially identical, if not identical, arrays) the populations may be labeled with the same or different labels. As such, a characteristic of certain embodiments is that the different collections or populations of produced labeled nucleic acids are all labeled with the same label, such that they are not distinguishably labeled. In yet other embodiments, a characteristic of the different collections or populations of produced labeled nucleic acids is that the first and second labels are typically distinguishable from each other. The constituent members of the above produced collections typically range in length from about 100 to about 10,000 nt, such as from about 200 to about 10,000 nt, including from about 100 to about 1,000 nt, from about 100 to about 500 nt, etc.

In the next step of the subject methods, the collections or populations of labeled nucleic acids produced by the subject methods are contacted to a plurality of different surface immobilized elements (i.e., features) under conditions such that nucleic acid hybridization to the surface immobilized elements can occur. The collections can be contacted to the surface immobilized elements either simultaneously or serially. In many embodiments the compositions are contacted with the plurality of surface immobilized elements, e.g., the array of distinct oligonucleotides of different sequence, simultaneously. Depending on how the collections or populations are labeled, the collections or populations may be contacted with the same array or different arrays, where when the collections or populations are contacted with different arrays, the different arrays are substantially, if not completely, identical to each other in terms of feature content and organization.

A characteristic of the present invention is that the substrate immobilized nucleic acids that make up the features of the arrays employed in the subject methods are oligonucleotides. By oligonucleotide is meant a nucleic acid having a length ranging from about 10 to about 200 nt including from about 10 or about 20 nt to about 100 nt, where in many embodiments the immobilized nucleic acids range in length from about 50 to about 90 nt or about 50 to about 80 nt, such as from about 50 to about 70 nt.

Surface immobilized nucleic acids that make up the features of the arrays employed in such applications can be derived from virtually any source. Typically, the nucleic acids will be nucleic acid molecules having sequences derived from representative locations along a chromosome of interest, a chromosomal region of interest, an entire genome of interest, a cDNA library, and the like.

The choice of surface immobilized nucleic acids to use may be influenced by prior knowledge of the association of a particular chromosome or chromosomal region with certain disease conditions. International Application WO 93/18186 provides a list of chromosomal abnormalities and associated diseases, which are described in the scientific literature. Alternatively, whole genome screening to identify new regions subject to frequent changes in copy number can be performed using the methods of the present invention. In these embodiments, surface immobilized elements or features usually contain nucleic acids representative of locations distributed over the entire genome. In such embodiments, the resolution may vary, where in certain embodiments, the resolution is at least about 500 kb, such as at least about 250 kb, at least about 200 kb, at least about 150 kb, at least about 100 kb, at least about 50 kb, including at least about 25 kb, at least about 10 kb or higher. Of interest in certain embodiments are resolutions ranging from about 20 kb to about 100 kb, such as 30 kb to about 100 kb, including from about 40 kb to about 75 kb. By resolution is meant the spacing on the genome between sequences found in the surface immobilized elements or features. In some embodiments (e.g., using a large number of features of high complexity) all sequences in the genome can be present in the array. In certain embodiments, the resolution is with respect to at least a portion of the genome, and may be about every 1 kb, about every 2 kb, about every 5 kb, about every 10 kb, as well as the numbers provided above. The spacing between different locations of the genome that are represented in the features of the collection of features may also vary, and may be uniform, such that the spacing is substantially the same, if not the same, between sampled regions, or non-uniform, as desired.

In some embodiments, previously identified regions from a particular chromosomal region of interest are used as array elements. Such regions are becoming available as a result of rapid progress of the worldwide initiative in genomics. In certain embodiments, the array can include features made up of surface immobilized oligonucleotides which “tile” a particular region (which has been identified in a previous assay), by which is meant that the features correspond to the region of interest as well as genomic sequences found at defined intervals on either side of, i.e., 5′ and 3′ of, the region of interest, where the intervals may or may not be uniform, and may be tailored with respect to the particular region of interest and the assay objective. In other words, the tiling density may be tailored based on the particular region of interest and the assay objective. Such “tiled” arrays and assays employing the same are useful in a number of applications, including applications where one identifies a region of interest at a first resolution, and then uses tiled arrays tailored to the initially identified region to further assay the region at a higher resolution, e.g., in an iterative protocol. Accordingly, the subject methods include at least two iterations, where the first iteration of the subject methods identifies a region of interest, and the one or more subsequent iterations assay the region with sets of tiled surface immobilized features, e.g., of increasing or alternate resolution.
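The tiling idea described above can be illustrated with a small Python sketch that lists feature start coordinates across a region of interest and its flanks. It is only a sketch under stated assumptions: the function name `tile_region`, the 0-based coordinate convention, the default 60 nt feature length and the uniform spacing are all hypothetical choices made for illustration, and the text above also contemplates non-uniform intervals.

```python
def tile_region(region_start, region_end, spacing, flank, probe_length=60):
    """Return start coordinates of features tiling a region and its flanks.

    Coordinates are 0-based positions on the chromosome. The tiled span
    runs from region_start - flank to region_end + flank, and every
    feature must fit entirely inside that span.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    first = max(0, region_start - flank)
    last = region_end + flank - probe_length  # last admissible start
    return list(range(first, last + 1, spacing))
```

In an iterative protocol of the kind described, one might call the function first with a coarse spacing and then again with a finer spacing over the same region, e.g. `tile_region(1000, 2000, 250, 500)` followed by `tile_region(1000, 2000, 100, 500)`, which yields more features over the same span.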

Of interest are both coding and non-coding genomic regions (as well as regions that are transcribed but not translated), where by coding region is meant a region of one or more exons that is transcribed into an mRNA product and from there translated into a protein product, while by non-coding region is meant any sequences outside of the exon regions, where such regions may include regulatory sequences, e.g., promoters, enhancers, introns, inter-genic regions, etc. In certain embodiments, one can have at least some of the features directed to non-coding regions and others directed to coding regions. In certain embodiments, one can have all of the features directed to non-coding sequences. In certain embodiments, one can have all of the features directed to, i.e., corresponding to, coding sequences.

In certain embodiments, the oligonucleotides that make up the distinct features are ones that have been designed according to one or more particular parameters to be suitable for use in a given application, where representative parameters include, but are not limited to: length, melting temperature (T_(m)), non-homology with other regions of the genome, signal intensities, kinetic properties under hybridization conditions, etc., see e.g., U.S. Pat. No. 6,251,588, the disclosure of which is herein incorporated by reference. In certain embodiments, the entire length of the feature oligonucleotides is employed in hybridizing to sequences in the genome, while in other embodiments, only a portion of the immobilized oligonucleotide has sequence that hybridizes to sequence found in the genome of interest, e.g., where a portion of the oligonucleotide serves as a tether. For example, a given oligonucleotide may include a 30 nt long genome specific sequence linked to a 30 nt tether, such that the oligonucleotide is a 60-mer of which only a portion, e.g., 30 nt long, is genome specific.

The surface immobilized oligonucleotides of the features employed in the subject methods are immobilized on a solid support. Many methods for immobilizing nucleic acids on a variety of solid support surfaces are known in the art. For instance, the solid support may be a membrane, glass, plastic, or a bead. The desired component may be covalently bound or noncovalently attached through nonspecific binding, adsorption, physisorption or chemisorption. The immobilization of nucleic acids on solid support surfaces is discussed more fully below.

A wide variety of organic and inorganic polymers, as well as other materials, both natural and synthetic, may be employed as the material for the solid surface. Illustrative solid surfaces include nitrocellulose, nylon, glass, fused silica, diazotized membranes (paper or nylon), silicones, cellulose, and cellulose acetate. In addition, plastics such as polyethylene, polypropylene, polystyrene, and the like can be used. Other materials that may be employed include paper, ceramics, metals, metalloids, semiconductive materials, cermets or the like. In addition, substances that form gels can be used. Such materials include proteins (e.g., gelatins), lipopolysaccharides, silicates, agarose and polyacrylamides. Where the solid surface is porous, various pore sizes may be employed depending upon the nature of the system.

As reviewed above, arrays can be fabricated using a variety of different protocols. Of interest in certain embodiments are arrays prepared by drop deposition from pulse-jets of either nucleic acid precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained nucleic acid. Such methods are described in detail in, for example, the previously cited references including U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. As already mentioned, these references are incorporated herein by reference. Other drop deposition methods can be used for fabrication, as previously described herein. Also, instead of drop deposition methods, photolithographic array fabrication methods may be used. Inter-feature areas need not be present, particularly when the arrays are made by photolithographic methods as described in those patents. Of particular interest in certain embodiments are arrays produced via in situ preparation protocols.

In the subject methods (as summarized above), the copy numbers of particular nucleic acid sequences in two solution phase collections are compared by hybridizing the collections to one or more nucleic acid, specifically oligonucleotide, arrays, as described above. The hybridization signal intensity, and the ratio of intensities, read from any resultant surface immobilized nucleic acid duplexes (made up of hybridized feature oligonucleotides and solution phase nucleic acids) produced is determined. Since signal intensities on a feature can be influenced by factors other than the copy number of a solution phase nucleic acid population, for certain embodiments an analysis is conducted where two labeled populations are present with distinct labels. Thus comparison of the signal intensities for a specific surface immobilized element permits a direct comparison of copy number for a given sequence. Different surface immobilized elements will reflect the copy numbers for different sequences in the solution phase populations. The comparison can reveal situations where each sample includes a certain number of copies of a sequence of interest, but the numbers of copies in each sample are different. The comparison can also reveal situations where one sample is devoid of any copies of the sequence of interest, and the other sample includes one or more copies of the sequence of interest.
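The per-feature intensity comparison described above can be sketched in Python as follows. The text specifies only that a ratio of intensities is determined; the use of a base-2 logarithm, the intensity floor, the 0.5 threshold and the names `log2_ratios` and `call_copy_change` are assumptions introduced here for illustration and are not taken from the specification.

```python
import math


def log2_ratios(test, reference, floor=1.0):
    """Per-feature log2(test / reference) for two intensity lists.

    Intensities below `floor` are raised to `floor` (a hypothetical guard)
    so that a feature with no signal does not cause a division by zero.
    """
    if len(test) != len(reference):
        raise ValueError("both channels must cover the same features")
    return [math.log2(max(t, floor) / max(r, floor))
            for t, r in zip(test, reference)]


def call_copy_change(log_ratio, threshold=0.5):
    """Label a feature as gain, loss or no change (illustrative threshold)."""
    if log_ratio >= threshold:
        return "gain"
    if log_ratio <= -threshold:
        return "loss"
    return "no change"
```

With test intensities `[200, 100, 50]` against reference intensities `[100, 100, 100]`, the ratios are 2, 1 and 0.5, so the three features would be labeled as a gain, no change and a loss respectively under these illustrative settings.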

Standard hybridization techniques (using high stringency hybridization and washing conditions) are used to assay a nucleic acid array. Suitable methods are described in references describing CGH techniques (Kallioniemi et al., Science 258:818-821 (1992) and WO 93/18186). Several guides to general techniques are available, e.g., Tijssen, Hybridization with Nucleic Acid Probes, Parts I and II (Elsevier, Amsterdam 1993). For descriptions of techniques suitable for in situ hybridizations see Gall et al., Meth. Enzymol., 21:470-480 (1981) and Angerer et al. in Genetic Engineering: Principles and Methods, Setlow and Hollaender, Eds., Vol. 7, pgs 43-65 (Plenum Press, New York 1985). See also U.S. Pat. Nos. 6,335,167; 6,197,501; 5,830,645; and 5,665,549; the disclosures of which are herein incorporated by reference.

Generally, nucleic acid hybridizations comprise the following major steps: (1) provision of an array of surface immobilized nucleic acids or features; (2) optional pre-hybridization treatment to increase accessibility of features, and to reduce nonspecific binding; (3) hybridization of the mixture of nucleic acids to the features on the solid surface, typically under high stringency conditions; (4) post-hybridization washes to remove nucleic acid fragments not bound in the hybridization; and (5) detection of the hybridized nucleic acid fragments. The reagents used in each of these steps and their conditions for use vary depending on the particular application.

As indicated above, hybridization is carried out under suitable hybridization conditions, which may vary in stringency as desired. In certain embodiments, highly stringent hybridization conditions may be employed. The term “highly stringent hybridization conditions” as used herein refers to conditions that are compatible with producing nucleic acid binding complexes on an array surface between complementary binding members, i.e., between immobilized features and complementary solution phase nucleic acids in a sample. Representative high stringency assay conditions that may be employed in these embodiments are provided above.

The above hybridization step may include agitation of the immobilized features and the sample of solution phase nucleic acids, where the agitation may be accomplished using any convenient protocol, e.g., shaking, rotating, spinning, and the like.

Following hybridization, the surface of immobilized nucleic acids is typically washed to remove unbound nucleic acids. Washing may be performed using any convenient washing protocol, where the washing conditions are typically stringent, as described above.

Following hybridization and washing, as described above, the hybridization of the labeled nucleic acids to the array is then detected using standard techniques so that the surface of immobilized features, e.g., array, is read. Reading of the resultant hybridized array may be accomplished by illuminating the array and reading the location and intensity of resulting fluorescence at each feature of the array to detect any binding complexes on the surface of the array. For example, a scanner may be used for this purpose which is similar to the AGILENT MICROARRAY SCANNER available from Agilent Technologies, Palo Alto, Calif. Other suitable devices and methods are described in U.S. patent applications: Ser. No. 09/846,125 “Reading Multi-Featured Arrays” by Dorsel et al.; and U.S. Pat. No. 6,406,849, which references are incorporated herein by reference. However, arrays may be read by any other method or apparatus than the foregoing, with other reading methods including other optical techniques (for example, detecting chemiluminescent or electroluminescent labels) or electrical techniques (where each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,221,583 and elsewhere). In the case of indirect labeling, subsequent treatment of the array with the appropriate reagents may be employed to enable reading of the array. Some methods of detection, such as surface plasmon resonance, do not require any labeling of the nucleic acids, and are suitable for some embodiments.

Results from the reading or evaluating may be raw results (such as fluorescence intensity readings for each feature in one or more color channels) or may be processed results, such as obtained by subtracting a background measurement, or by rejecting a reading for a feature which is below a predetermined threshold and/or forming conclusions based on the pattern read from the array (such as whether or not a particular feature sequence may have been present in the sample, or whether or not a pattern indicates a particular condition of an organism from which the sample came).
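A minimal sketch of how raw readings might be turned into processed results is given below, assuming a single global background estimate and a threshold applied to the background-subtracted signal. Both choices, together with the function name and the numerical values, are assumptions made for illustration; the processing actually used in the examples is described in Materials and Methods.

```python
def process_readings(raw, background, threshold):
    """Subtract background from each raw reading and drop features below threshold.

    raw: dict mapping feature id -> raw fluorescence intensity
    background: one background estimate applied to every feature (assumed)
    threshold: minimum background-subtracted signal required to keep a feature
    """
    processed = {}
    for feature_id, intensity in raw.items():
        net = intensity - background
        if net >= threshold:
            processed[feature_id] = net
    return processed

# Hypothetical raw intensities for three features in one color channel
raw_readings = {"f1": 1250.0, "f2": 130.0, "f3": 640.0}
kept = process_readings(raw_readings, background=100.0, threshold=50.0)
```

Here feature "f2" is rejected because its background-subtracted signal falls below the assumed threshold, while "f1" and "f3" are retained for downstream ratio calculations.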

In certain embodiments, the subject methods include a step of transmitting data or results from at least one of the detecting and deriving steps, also referred to herein as evaluating, as described above, to a remote location. By “remote location” is meant a location other than the location at which the array is present and hybridization occurs. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items are at least in different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart.

“Communicating” information means transmitting the data representing that information as electrical signals over a suitable communication channel (for example, a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. The data may be transmitted to the remote location for further evaluation and/or use. Any convenient telecommunications means may be employed for transmitting the data, e.g., facsimile, modem, internet, etc.

A feature of certain embodiments of the above methods is that they are sufficiently sensitive to detect a single copy number difference or change in the amount of a sequence of interest between any two given samples. In other words, the subject methods are capable of detecting a single copy number variation in a sequence between any two samples. As such, the subject methods are highly sensitive methods of comparing the copy numbers of one or more sequences between two or more samples.

Utility

The above-described methods find use in any application in which one wishes to compare the copy number of nucleic acid sequences found in two or more populations. One type of representative application in which the subject methods find use is the quantitative comparison of copy number of one nucleic acid sequence in a first collection of nucleic acid molecules relative to the copy number of the same sequence in a second collection. The subject methods find use in the detection of both heterozygous and homozygous deletions of sequences, as well as amplification of sequences, which conditions may be characteristic of certain conditions, e.g., disease conditions.

As such, embodiments of the present invention may be used in methods of comparing abnormal nucleic acid copy number and mapping of chromosomal abnormalities associated with disease. In certain embodiments, the subject methods are employed in applications that use nucleic acids immobilized on a solid support, to which differentially labeled solution phase nucleic acids produced as described above are hybridized. Analysis of processed results of the described hybridization experiments provides information about the relative copy number of nucleic acid domains, e.g. genes, in genomes.

Such applications compare the copy numbers of sequences capable of binding to the features. Variations in copy number detectable by the methods of the invention may arise in different ways. For example, copy number may be altered as a result of amplification or deletion of a chromosomal region, e.g. as commonly occurs in cancer.

Representative applications in which the subject methods find use are further described in U.S. Pat. Nos. 6,335,167; 6,197,501; 5,830,645; and 5,665,549; the disclosures of which are herein incorporated by reference.

Kits

Also provided are kits for use in the subject invention, where such kits may comprise containers, each with one or more of the various reagents/compositions utilized in the methods, where such reagents/compositions typically at least include a collection of immobilized oligonucleotide features, e.g., one or more arrays of oligonucleotide features, and reagents employed in labeled nucleic acid production, e.g., random primers, buffers, the appropriate nucleotide triphosphates (e.g. dATP, dCTP, dGTP, dTTP), DNA polymerase, labeling reagents, e.g., labeled nucleotides, and the like. Where the kits are specifically designed for use in CGH applications, the kits may further include labeling reagents for making two or more collections of distinguishably labeled nucleic acids according to the subject methods, an array of features, hybridization solution, etc.

Finally, the kits may further include instructions for using the kit components in the subject methods. The instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or sub-packaging) etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, diskette, etc.

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

I. Results and Discussion

Four different array types were used in these experiments. In the first set of experiments, male (XY) and female (XX) genomic DNA were prepared and hybridized to an Agilent Technologies 60-mer in situ synthesized oligonucleotide microarray (Array 1) according to the protocols described below in Materials and Methods. The arrays were prepared by the protocol described in U.S. Pat. No. 6,444,268 (the disclosure of which is herein incorporated by reference) with 60-mer oligonucleotide sequences designed for the purposes of gene expression profiling studies for ~7.9K genes from the RefSeq database of the NIH (National Center for Biotechnology Information, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, Md. 20894).

The Cy5/Cy3 (XY/XX) ratios were calculated for features prepared from the X chromosome (273 features), Y chromosome (14 features) or all autosomes (7722 features). The average log₂ ratio for autosomal features was 0.02 (+/−0.41) and for X chromosome features was −0.87 (+/−0.61). This result compares favorably to the expected ratios of 0.0 and −1.0, respectively. For the Y chromosome features present on the array, the expected ratios are undefined but should be greater than 1 because these sequences are present in the male XY DNA sample but absent in the female XX sample. Thus, the value in the denominator of the calculated ratio is very small and potentially near zero. An average log₂ ratio for Y chromosome-specific sequences of 1.30 (+/−1.80) was observed.
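The expected values quoted above follow from simple copy number arithmetic, which may be sketched as follows. The sketch assumes an idealized assay in which signal is strictly proportional to copy number; the function name is hypothetical, and real measurements deviate from these values, as the observed averages show.

```python
import math

def expected_log2_ratio(test_copies, reference_copies):
    """Idealized log2 ratio if signal were strictly proportional to copy number."""
    if reference_copies == 0:
        return None  # ratio undefined: sequence absent from the reference
    if test_copies == 0:
        return float("-inf")  # sequence absent from the test sample only
    return math.log2(test_copies / reference_copies)

# Male (XY) test sample against female (XX) reference
autosome = expected_log2_ratio(2, 2)  # two copies in each sample
x_chrom = expected_log2_ratio(1, 2)   # one X in XY, two in XX
y_chrom = expected_log2_ratio(1, 0)   # Y present in XY, absent in XX
```

Under these assumptions the autosomal and X chromosome expectations are 0.0 and −1.0, and the Y chromosome expectation is undefined because the reference copy number is zero.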

The colon carcinoma cell line Colo320 contains a well-known amplification of the oncogene v-myc. Genomic DNA was isolated from this cell line and hybridized to Array 1 with normal female genomic reference DNA. The Cy5/Cy3 ratio for the v-myc nucleic acid indicated a 100-fold increase in this gene in the cell line over the normal reference. This result is within the range of v-myc amplification detected by other techniques in published studies (Nat. Genet. 2001, 29:263-264).

To further characterize the capability of 60-mer oligonucleotide microarrays to detect and map regions of amplification and deletion throughout the genome, a second array (Array 2) was used to measure copy number variations in tumor cell lines, including Colo320 and HT-29, with chromosomal abnormalities previously characterized by various technologies including BAC aCGH (Snijders et al., supra). Array 2 consisted of 60-mer oligonucleotide features designed and validated for expression profiling of more than 17,000 transcripts. Data from 97 features whose reference channel signals were less than 3 standard deviations above the average of 162 negative control feature signal intensities were discarded. In addition, 5950 features that either contained homology to more than one genomic sequence (5175 features) or spanned multiple exons (755 features) were removed, leaving 11066 features including 373 on the X chromosome. Previously characterized chromosomal aberrations detected in these cell lines included a high-level (log₂ ratio = 6.4) amplification of MYC in Colo320 (FIG. 4 a), and an amplicon spanning 8q23.1-q24.23 with a 3-fold (log₂ ratio = 1.5) increase in the copy number of MYC with simultaneous single copy 8p deletion in HT-29 (FIG. 4 b).
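The negative control filter described above may be sketched as follows. The sketch assumes that the sample standard deviation is used and that a feature is kept when its reference channel signal is at or above the cutoff; the function name and all signal values are hypothetical and do not reproduce the data of the example.

```python
from statistics import mean, stdev

def significant_features(reference_signals, negative_controls, n_sd=3.0):
    """Keep features whose reference signal reaches mean + n_sd * SD of the controls."""
    cutoff = mean(negative_controls) + n_sd * stdev(negative_controls)
    kept = {fid: s for fid, s in reference_signals.items() if s >= cutoff}
    return kept, cutoff

# Hypothetical negative control intensities and reference channel signals
negative_controls = [10.0, 12.0, 8.0, 10.0]
reference_signals = {"f1": 14.0, "f2": 200.0, "f3": 15.0}

kept, cutoff = significant_features(reference_signals, negative_controls)
```

In this sketch feature "f1" is discarded because its reference signal does not reach the cutoff, so its ratio would be dominated by noise in the denominator.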

In order to test oligonucleotide aCGH with patient samples, four soft tissue sarcomas collected from 1980-2003 and accessioned via the National Cooperative Human Tissue Network (CHTN) were screened. These were hybridized to Array 2 with normal female reference DNA. Amplicons on chromosome 12q that contained known targets of amplification in sarcomas such as SAS, HDM1, and HMGIC were identified in these tumors (Table 1).

TABLE 1
Detection of Copy Number Changes in In Vivo Sarcomas

                      Log₂ Fluorescence Ratios^(a)
Feature               Sarcoma Samples
Gene     Locus        ST103    ST112    ST130    ST240
SAS      12q13.3       0.36     3.50     2.29     1.04
DYRK2    12q14.3      −0.05     0.18     2.21     3.64
HMGIC    12q14.3       0.39     2.75     3.84     1.41
HDM1     12q15        −0.48     2.08     2.11     3.82
IFNG     12q15        −0.58    −1.14     0.75     3.62

^(a)Log₂ ratios of Cy5/Cy3 background subtracted dye normalized fluorescence signals for individual 60-mer features that map to the indicated gene locus.
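For orientation, the log₂ ratios of Table 1 can be converted to fold changes and screened against a cutoff, as in the sketch below. The values are transcribed from Table 1; the cutoff of 1.0 (a greater than two-fold increase) and the names used are assumptions chosen for illustration and are not criteria stated in this specification.

```python
# Log2 ratios transcribed from Table 1 (columns: ST103, ST112, ST130, ST240)
table1 = {
    "SAS":   [0.36, 3.50, 2.29, 1.04],
    "DYRK2": [-0.05, 0.18, 2.21, 3.64],
    "HMGIC": [0.39, 2.75, 3.84, 1.41],
    "HDM1":  [-0.48, 2.08, 2.11, 3.82],
    "IFNG":  [-0.58, -1.14, 0.75, 3.62],
}
samples = ["ST103", "ST112", "ST130", "ST240"]

def fold_change(log2_ratio):
    """Convert a log2 ratio back to a linear fold change."""
    return 2.0 ** log2_ratio

CUTOFF = 1.0  # assumed illustrative cutoff, i.e. more than a two-fold increase

# For each sample, list the genes whose log2 ratio exceeds the cutoff
amplified = {
    sample: [gene for gene, values in table1.items() if values[i] > CUTOFF]
    for i, sample in enumerate(samples)
}
```

With this assumed cutoff, no locus is flagged in ST103, whereas all five loci are flagged in ST240.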

Array 3 contained features representing unique genomic sequences that spanned the X chromosome at an average spacing of 75 kb. These same arrays contained 1653 features representing unique genomic sequences at approximately 50 kb average spacing along chromosome 18q to assess the ability to detect and map intrachromosomal single copy losses in the 18q-syndrome patient derived cell lines. DNA from diploid cell lines derived from 18q-syndrome patients (one XX and three XY) containing cytogenetically mapped deletions of chromosome 18q (U.S. Pat. No. 6,465,182) was cohybridized with normal XX reference DNA to Array 3. For this array, 38 of 2116 features from the X chromosome and 14 of 1653 features from 18q were considered insignificant as they had mean signals less than a value of the background level plus three standard deviations of the negative control features in the reference channel in at least 6 of 11 hybridizations. Examination of the ratios of the X chromosome features from duplicate hybridizations from the three XY and one XX 18q-syndrome patients revealed an average median log₂ ratio value of −0.68 for XY/XX and −0.04 for XX/XX (FIG. 5). The best threshold between the X chromosome feature ratios that differentiates between XY/XX and XX/XX permits a feature-by-feature call rate for XY versus XX of greater than 85%. In each of these cell lines single copy loss on 18q was detected and the breakpoint region was localized visually and numerically within the known cytogenetic band location (FIG. 6) (Silverman et al., Human Genet. (1995) 56: 926-937). Breakpoints within 120 kb of the initial assays were observed in duplicate hybridizations, indicating a high level of reproducibility.

To test our ability to detect and measure homozygous deletions, we utilized Array 4, which contains 5464 features spanning chromosome 16 with feature content biased toward expressed gene regions. We cohybridized to these arrays genomic DNA from the well characterized colon carcinoma cell line HCT116 that contains homozygous deletions at 16p12 and 16q23 (Snijders et al., Nat. Genet. (2001) 29(3):p. 263-4; Paige et al., Cancer Res. (2000) 60:p1690-7) with normal female DNA (FIG. 7). We observed two regions on chromosome 16 with CGH ratios consistent with areas of homozygous deletion and were able to localize these deletions to single gene loci: 16p12.2 (deletion A) A2BP1; 16q23.1 (deletion B) WWOX.
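The feature-by-feature XY versus XX call on the X chromosome features of Array 3 can be illustrated with a simple threshold sweep. The sketch below is one possible way to choose such a threshold and is not asserted to be the procedure used to obtain the reported call rate; the function names and the ratio values are hypothetical.

```python
def call_rate(xy_ratios, xx_ratios, threshold):
    """Fraction of features called correctly when ratios below threshold are called XY."""
    correct = sum(r < threshold for r in xy_ratios)
    correct += sum(r >= threshold for r in xx_ratios)
    return correct / (len(xy_ratios) + len(xx_ratios))

def best_threshold(xy_ratios, xx_ratios):
    """Try every observed ratio as a threshold and return the first one with the top call rate."""
    candidates = sorted(set(xy_ratios) | set(xx_ratios))
    return max(candidates, key=lambda t: call_rate(xy_ratios, xx_ratios, t))

# Hypothetical log2 ratios for X chromosome features
xy = [-0.9, -0.7, -0.6, -0.1]   # XY sample versus XX reference
xx = [-0.2, 0.0, 0.1, -0.05]    # XX sample versus XX reference

threshold = best_threshold(xy, xx)
rate = call_rate(xy, xx, threshold)
```

Because the two ratio distributions overlap, no threshold calls every feature correctly in this toy data set, which mirrors the less than perfect per-feature call rate reported above.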

The above results demonstrate that in situ synthesized 60-mer oligonucleotide arrays can reproducibly detect genomic lesions, including single copy and homozygous deletions, as well as variable amplicons, using whole genomes from a variety of tissue sources as targets. Given the unsurpassed design flexibility inherent in oligonucleotide arrays, the demonstration here that they can be used to characterize copy number abnormalities in non-reduced complexity samples shows that this technology will emerge as a standard tool for research and diagnostics of cancer and genetic disease, among other applications.

II. Materials and Methods

A. Genomic DNA.

Genomic DNA from normal male 46,XY and normal female 46,XX (Promega, Madison, Wis.) was obtained. Cell lines 47,XXX (repository number GM04626), 48,XXXX (GM01415D), 49,XXXXX (GM05009C) and the 18q deletion syndrome cell lines (GM16447, 16449, 16451, 16453, 16455, and 50122) are part of the NIGMS Human Genetic Cell Repository and were obtained from the Coriell Institute of Medical Research. The colon carcinoma lines (COLO 320DM, HT 29 and HCT-116) and the breast carcinoma cell lines (MDA-MB-231 and MDA-MB-453) were obtained from American Type Culture Collection. Each cell line was grown under conditions specified by the supplier. Genomic DNA was prepared from each cell line using the DNeasy Tissue Kit (Qiagen, Germantown, Md.). Tumor biopsies were collected from 1980-2003, accessioned via the National Cooperative Human Tissue Network (CHTN). Total cellular DNA was isolated from fresh frozen tumor specimens using standard TRIzol Reagent (Invitrogen, Gaithersburg, Md.) extraction techniques and further purified with standard chloroform-phenol extraction techniques.

B. Summary of Arrays 1 to 4

Array      Design                              Number of Relevant Features (60mers)
Array 1    Designed for gene expression        Total: 7900
                                               Chromosome X: 273
                                               Chromosome Y: 14
Array 2    Designed for gene expression        Total: 17000
           (Agilent Human 1A Oligo Array)      Chromosome X: 373
                                               Chromosome 8: 348
                                               Chromosome 12: 399
Array 3    Custom array design                 Total: 22000
                                               Chromosome X: 2116
                                               Chromosome 18: 1653
Array 4    Custom array design                 Total: 22000
                                               Chromosome 16: 5464

C. Sample Labeling.

For each CGH hybridization, 20 μg of genomic DNA from the reference (46,XX female) and from the corresponding experimental sample was digested with AluI (12.5 units) and RsaI (12.5 units) (Promega). All digests were done for a minimum of 2 hours at 37° C. then verified by agarose gel analysis. Individual reference and experimental samples were then filtered using the Qiaquick PCR Cleanup Kit (Qiagen). Labeling reactions were performed with 60 μg of purified restricted DNA and a Bioprime labeling kit (Invitrogen) according to the manufacturer's directions in a 50 μl volume with a modified dNTP pool: 120 μM each of dATP, dGTP, dCTP, 60 μM dTTP, and 60 μM of either Cy5-dUTP for the experimental sample or Cy3-dUTP for the 46,XX female reference (Perkin-Elmer, Boston, Mass.). Labeled nucleic acids were subsequently filtered using a Centricon YM-30 filter (Millipore, Bedford, Mass.). Experimental and reference nucleic acids for each hybridization were pooled, mixed with 50 pg of human Cot-1 DNA (Invitrogen), 100 μg of yeast tRNA (Invitrogen) and 1× hybridization control nucleic acids (SP310, Operon). The nucleic acid mixture was purified then concentrated with a Centricon YM-30 column, and resuspended to a final volume of 250 μl, then mixed with an equal volume of Agilent 2X in situ Hybridization Buffer.

D. Oligonucleotide Microarray Processing.

Prior to hybridization to the arrays, the 500 μl hybridization mixtures were denatured at 100° C. for 1.5 minutes and incubated at 37° C. for 30 minutes. In order to remove any precipitate, the mixture was centrifuged at ≥14,000 g for 5 minutes and transferred to a new tube leaving a small residual volume (≤5 μl). The sample was applied to the array using an Agilent microarray hybridization chamber and hybridization was carried out for 14-18 hrs at 65° C. in a Robbins Scientific rotating oven at 4 rpm. The arrays were then disassembled in 0.5×SSC/0.005% Triton X102 (wash 1) at 65° C. then washed for 10 minutes at RT in wash 1, followed by 5 minutes at RT in 0.1×SSC/0.005% Triton X102 (wash 2). Slides were dried and scanned using an Agilent 2565AA DNA microarray scanner.

E. Image and Data Analysis.

Microarray images were analyzed using Agilent Feature Extraction software version 6.1.1. Default settings were used except that only 60-mer features from diploid autosomal chromosomes were used for dye normalization using the locally weighted linear regression curve fit (LOWESS) method (http://www.chem.agilent.com/temp/rad506EE/00036948.pdf). Arrays 2, 3 and 4 contained replicate features for a subset of the feature sequences. For these replicate features the mean and standard deviation of background-subtracted signals were calculated in both channels independently after the elimination of outliers. Outlier feature rejection was based on limits of 1.5 IQR (interquartile ranges) from the median.
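The replicate handling described above may be sketched as follows. This is an illustrative reading of the outlier rule and not the Feature Extraction software's own implementation: the quartile method (the default of Python's statistics.quantiles), the function name and the replicate values are assumptions.

```python
from statistics import mean, median, quantiles, stdev

def reject_outliers(signals, k=1.5):
    """Keep replicate signals lying within k * IQR of the median (needs two or more values)."""
    q1, _, q3 = quantiles(signals, n=4)  # quartile cut points
    iqr = q3 - q1
    center = median(signals)
    return [s for s in signals if abs(s - center) <= k * iqr]

# Hypothetical background-subtracted signals for one replicated feature, one channel
replicates = [100.0, 102.0, 98.0, 101.0, 99.0, 250.0]

kept = reject_outliers(replicates)
replicate_mean = mean(kept)  # summary statistics computed after outlier removal
replicate_sd = stdev(kept)
```

In this sketch the aberrant replicate at 250.0 is removed before the mean and standard deviation are computed, and the same procedure would be applied to each channel independently.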

The above results and discussion demonstrate that novel methods of performing CGH are provided. Advantages of the subject invention that result from the use of immobilized oligonucleotide features include, but are not limited to: (a) the ability to employ short feature oligonucleotides that minimize cross-hybridization while maintaining maximum hybridization affinity; (b) lower background and therefore a lower limit of detection; (c) increased resolution for both coding and non-coding regions; (d) elimination of the need to screen or assay nucleic acid collections of reduced complexity (and therefore elimination of the need to employ protocols that reduce complexity, e.g., by selective amplification, with their attendant possibilities of undesired selective enhancement); (e) the ability to detect DNA alterations at virtually any site in the genome; and the like. As such, the subject methods represent a significant contribution to the art.

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

1-49. (canceled)
50. A method for comparing the copy number of at least one nucleic acid sequence in at least two genomic sources, said method comprising: (a) preparing at least a first collection of nucleic acid molecules from a first genomic source and a second collection of nucleic acid molecules from a second genomic source, wherein said first and second genomic sources have a complexity of 1×10⁹ base pairs or more and said first and second collections are of non-reduced complexity; (b) contacting said first and second collections of nucleic acid molecules with one or more pluralities of distinct oligonucleotide features bound to a surface of a solid support, wherein said one or more pluralities of distinct oligonucleotide features comprise oligonucleotides that range in size from 20 nt to 200 nt in length and said contacting occurs under stringent assay conditions; (c) measuring the binding of the first and second collections of nucleic acid molecules to said features to produce data; and (d) identifying a quantitative difference in the copy number of at least one nucleic acid sequence in said at least two genomic sources using said data.
51. The method according to claim 50, wherein said oligonucleotides range from 20 to 100 nt in length.
52. The method according to claim 51, wherein said oligonucleotides range from 50 to 90 nt in length.
53. The method according to claim 50, wherein the first and second genomic sources comprise mammalian genomic DNA.
54. The method according to claim 50, wherein the first and second genomic sources comprise human genomic DNA, and said oligonucleotides range from 20 to 100 nt in length.
55. The method according to claim 50, wherein said plurality of oligonucleotide features bound to a surface of a solid support include sequences of locations distributed across at least a portion of a chromosome.
56. The method according to claim 55, wherein said locations have a non-uniform spacing across at least a portion of the chromosome.
57. The method according to claim 50, wherein said nucleic acids of said first and second collections range from 100 to 10000 nt in length.
58. The method according to claim 50, wherein said collections of nucleic acids are contacted with the same plurality of distinct oligonucleotide features.
59. The method according to claim 50, wherein said collections of nucleic acids are distinguishably labeled.
60. The method according to claim 50, wherein the solid support is a planar substrate.
61. The method according to claim 50, wherein said non-reduced complexity collections have a complexity that is at least 25% of their respective genomic sources.
62. The method according to claim 50, wherein said non-reduced complexity collections have a complexity that is at least 50% of their respective genomic sources.
63. The method according to claim 50, wherein said plurality of distinct oligonucleotide features are bound to a solid surface in an array.
64. The method according to claim 50, wherein each of said first and second collections is prepared by a primer extension reaction using a set of random primers and using said genomic sources as genomic templates.
65. The method according to claim 50, wherein said preparing comprises amplifying said first genomic source and second genomic source in a manner such that said first and second collections are of non-reduced complexity.
66. The method according to claim 50, wherein said first and second genomic sources are obtained from a plant.
67. The method according to claim 50, wherein step (d) comprises identifying a single copy number heterozygous deletion in a genomic region of said first or second genomic sources.
68. A method for comparing the copy number of at least one nucleic acid sequence in at least two genomic sources, said method comprising: (a) preparing at least a first collection of nucleic acid molecules from a first genomic source and a second collection of nucleic acid molecules from a second genomic source, wherein said first and second genomic collections have a complexity of 1×10⁸ base pairs or more; (b) contacting said first and second collections of nucleic acid molecules with one or more pluralities of distinct oligonucleotide features bound to a surface of a solid support, wherein said one or more pluralities of distinct oligonucleotide features comprise oligonucleotides that range in size from 20 nt to 200 nt in length and said contacting occurs under stringent assay conditions; (c) measuring the binding of the first and second collections of nucleic acid molecules to said features to produce data; and (d) identifying a quantitative difference in the copy number of at least one nucleic acid sequence in said at least two genomic sources using said data.
69. The method of claim 68, wherein the first and second collections have a complexity that is at least 25% of their respective genomic sources.